Systems and methods for event detection

ABSTRACT

A system accesses a log of events on more than one computing system and scans these logs in an effort to determine the likely cause of various items of interest, events, or problems. These items of interest often include improper or frustrating behavior of a computer system, but may also include delightful or beneficial behaviors for which a user, group of users, company, service, or help desk seeks a cause. Once the likely source of the item of interest is found, a test may be performed to confirm the source of the problem and warning or corrective action taken.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of copending U.S. patent application Ser. No. 11/096,659 filed on Mar. 31, 2005, the contents of which are hereby fully incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates generally to systems and methods for event detection and analysis. More specifically, this invention relates to determining causes of concerns encountered by users of computing systems.

BACKGROUND OF THE INVENTION

Computer use is becoming increasingly complex, as traditional operating systems are under continual attack by a panoply of malicious software agents including viruses, nonviral “malware,” adware, spyware, and Web browser hijackers. Viral and nonviral threats are very serious concerns for consumers, service providers, help desks, and computer and software manufacturers. Additionally, operating systems may contain inefficiencies and errors that cause them to fail when a user runs a program or takes other seemingly innocuous actions. Consumer phone calls to help centers regarding spyware and adware typically require significant troubleshooting time. Usually the complaint is that the computer is performing slowly. Consumers often do not understand the differences among adware, spyware, worms, and viruses, and this lack of knowledge costs Internet service providers significant money.

Problems may arise on certain computer systems as a result of various kinds of user actions that trigger the installation of malicious software or computer registry changes. For example, a user may browse a web site, and malicious adware or spyware may then be installed on the user's system. Normally, a user (or an IT department of a company) does not know what web page is the source of the malicious software. Once the source is known, it is possible to block or quarantine access to that site manually or automatically. The problem may not appear when the user visits a web site, but might appear when the user clicks a link from that web site that redirects the browser to another site, in a nonobvious manner, that contains the offending software.

Generally, if a computer expert has knowledge of a sequence of steps prior to the detection of a problem, knowledge of this sequence of steps can be used to pinpoint the cause of the problem. However, it is not always clear which one of a number of steps or events prior to a problem is the true cause of the problem. Thus, it becomes beneficial to examine the sequence of steps on several or many other systems for which another user, or the system itself, determines that a problem has occurred. When the problem occurs on more than one machine on a network, a system can query the other machines for the sequence of steps that led to the problem. The system can then compare and contrast the steps on these other machines to derive a probable common root cause with high likelihood.

As the number of problems such as adware is proliferating and computer operating systems are becoming more complex, a growing need has been recognized for providing systems, methods, and services that can most efficiently and effectively lead users, service providers, companies, help desks, and computer hardware and software manufacturers to determine likely causes of problems encountered in computing systems such as computers, cell phones, PDAs, and other network-connected devices.

Computer terrorism, defined as the act of destroying or corrupting computer systems with an aim of destabilizing a country or of applying pressure to a government, is also an area of concern which the system and method can address. Computer terrorism may involve attacks that modify the logic of a computing system in order to introduce delays or to make the system unpredictable. Attacks may also include the modification of information that is entering or exiting the system, without the user's knowledge.

SUMMARY OF THE INVENTION

In accordance with at least one presently preferred embodiment of the present invention, a system accesses a log of events on more than one computing system and scans these logs in an effort to determine the likely cause of various items of interest, events, or problems. These “items of interest” often include improper or frustrating behavior of a computer system, but may also include delightful or beneficial behaviors for which a user, group of users, company, service, or help desk seeks a cause. The term “delightful” may refer to any useful, helpful, or beneficial items of interest, for example, a system (or software) feature or behavior that a user or group of users finds useful and for which the user or group of users seeks a cause. Examples of these delightful or beneficial features include: a pleasing sound, image, response, font, keyboard shortcut, mouse behavior, or any useful software application feature associated with a user's interactions with a computing device. Users may be delighted when a task is easy to perform, if a graphical user interface is pleasing to the eye, if a problem or frustrating feature improves or is no longer encountered, and when the system or software behaves in a useful, efficient, easy-to-understand, or otherwise pleasing manner.

Systems that are included as part of this detection service may utilize a software agent that monitors local events. The events may be gathered by the agent, or the agent may scan one or more event logs on the systems to gain access to the event information. The agent monitors any requests to share its event information with another computer on the network. In another aspect of the invention, a server may be installed to collect the event information and perform event analysis and correlation. Such a server may either be a shared server or a peer server. In a peer server, there is no dedicated server, but rather processes in one or more systems, which when coordinated, can collectively perform event analysis and correlation. Examples of software systems implementing a peer model (distributed computing) are the Sun JXTA framework and the activities of the Global Grid Forum. Additional information on these systems may be found at sun.com and gridforum.org. In a shared server, there is a dedicated server which performs analysis and correlation. This detection service may also be provided for a fee by a service provider remotely from the systems on which the events occurred.

When a problem or item of interest is detected, the agent preferably queries other participants located on the local or wide area network for event information. In one aspect of the invention, the event data may be correlated locally, while in another aspect of the invention, the event data may be sent to a server or more powerful computer system for analysis and correlation.

Once the likely source of the problem is found, a test may be performed to confirm the source of the problem, and warning or corrective action taken. One or more systems on the network may preferably be queried for a recorded set of steps that led up to the occurrence of the problem.

Correlating an item of interest with a particular cause may be done automatically, without human intervention, by the detection service scanning for a common event or action on a plurality of machines prior to an item of interest. For example, if five users accessed a web page within a four-minute time window prior to the observation of intrusive pop-up ads, and subsequently their web browsers crashed, then the event of browsing this web page is a likely cause of the item of interest, in this case, the production of intrusive pop-up ads. In other cases, likely causes of items of interest, such as computer problems, are less easy to find. In these cases, it is possible for a separate test computer to play back a sequence of events prior to an item of interest, to determine if the item of interest can be replicated. For example, the test computer can browse to the web site to determine if the pop-up ads are generated after browsing to this site. These kinds of tests or experiments may be performed in an automated fashion, without human intervention. These experiments may often concern infection of the test machine and may be conducted in a controlled and isolated manner on the test machine so that the entire machine is not infected or rendered inoperable. One way in which to create this isolation is through the use of a virtual machine in which the testing and experimenting is done. In this context, a virtual machine provides one or more execution environments on a single computer, isolated from one another. The host software which provides this capability is often referred to as a virtual machine monitor or hypervisor. Through the use of a virtual machine, which is computer software that isolates the experimentation from the rest of the computer, the detection service may test a sequence of steps without harming the test computer. Once the tests are conducted, the virtual machine can be terminated and any infections discarded.
In this way, the virtual machine may execute the scenarios leading up to the problem. It gathers statistics and attempts to correlate the data from two or more systems to pinpoint the cause. Once the cause of the item of interest (e.g., a problem) is determined, a fix for this problem may be supplied to the computing systems exhibiting the item of interest. Alternatively, the computer experiencing the problem may be “rolled back” to a state prior to the problem occurring. The concept of system “roll back” is well known to users of computers and often plays an integral part in modern operating systems. For example, sometimes a computer user installs a driver that renders a computing system unstable. Windows XP allows users to “roll back” a driver installation to the previously installed driver. More generally, the System Restore feature of Microsoft Windows XP enables users, in the event of a problem, to restore their PCs to a previous state without losing personal data files.

In summary, one aspect of the invention provides a method of event detection in computer systems, the method comprising the steps of: detecting an item of concern or item of interest; determining at least one event near to the item of concern on more than one computer; correlating the at least one event with the item of concern; and thereafter determining at least one probable cause of the item of concern.

Another aspect of the invention provides an apparatus for providing event detection in computer systems, the apparatus comprising: an arrangement for detecting an item of concern or interest on more than one computer; an arrangement for determining at least one event near to the item of concern; an arrangement for correlating the at least one event with the item of concern; and an arrangement for thereafter determining at least one probable cause of the item of concern.

Furthermore, an additional aspect of the invention provides a program storage device readable by machine, tangibly embodying a program of instructions executed by the machine to perform method steps for event detection in computer systems, the method comprising the steps of: detecting an item of concern or interest; determining at least one event near to the item of concern on more than one computer; correlating at least one event with the item of concern; and thereafter determining at least one probable cause of the item of concern.

For a better understanding of the present invention, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings, and the scope of the invention will be pointed out in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides a block diagram schematically illustrating an embodiment of the present invention.

FIG. 2 is a flow chart showing a mechanism for determining likely causes for items of interest.

FIG. 3 is a presentation of one preferred embodiment of the event logs on more than one computer.

DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention provides a detection service that facilitates automatically localizing the cause of items of interest associated with computer systems.

With reference to FIG. 1, there is provided in accordance with at least one presently preferred embodiment of the present invention an agent 102 that is installed on a client 101 or server system 103 and that is responsible for tracking specific events. These events may be caused by software, the user, service provider, company, or group of users, and include, for example, such trackable events as mouse events, keyboard events, browser requests via http and ftp, mail events via SMTP, and various other events that could affect the functioning and response of a user's system, such as system 101. Often, the events of most interest will be those that may affect the system in a negative fashion, such as those associated with adware, spyware, software installations, and viral and nonviral threats. However, these items of interest may also be beneficial or delightful occurrences for which a user, group of users, company, help desk, or service provider would like to find a cause. For example, if user A on a multiuser machine has installed a software application that makes keyboard key F1 display a useful list of recently edited files, user B may be delighted by this and wish to understand from where this feature has arisen. A help desk may wish to understand why a problem has suddenly gone away, which is another example of a “beneficial” item of interest.

Negative functioning of a computer can also be caused by simply plugging in a LAN cable, inserting a USB cable, or disabling a wireless password. Each of these kinds of acts can expose the system to threats. Preferably, a log will be created of such events, which can be used by the agent 102 as part of an overall detection service. The creation of event logs is well known in the prior art. For example, the Windows XP operating system generates an event log for various system, application, and security events. Events are sometimes classified by type such as “information,” “warning,” and “error.”

FIG. 2 is a flow chart showing a mechanism for determining likely causes for items of interest. In step 210, an item of interest is detected. For example, a user may notice that advertisements are popping up in an intrusive fashion. A company's IT department may suddenly notice unusually high Internet traffic coming from a user's machine. An automatic software agent may detect that adware or a virus is on a user's system. A detection service agent, such as agent 102, then scans events on the user's computer, designated as Computer 0, in step 220. In some instances, an event (e.g., browsing to a particular web page) will be the obvious cause of an item of interest. However, sometimes it may be difficult to determine an exact cause because several events have occurred between the actual cause and detection of an item of interest. If a likely cause is not determined in step 230, then the detection service agent selects another computer (N+1) in step 240 and scans the event log on computer N+1, for example a remote computer. This process of scanning event logs on other computers is repeated until a likely cause is determined. If no cause is determined, then the agent may have to wait until more data is available.
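
By way of a non-limiting illustration, the scanning loop of FIG. 2 may be sketched as follows. The sketch rests on assumptions that are not part of the description above: each event log is modeled as a list of event names, and a likely cause is assumed to have been found when exactly one event remains common to all logs scanned so far. The names `find_likely_cause` and `min_logs` are hypothetical.

```python
from typing import List, Optional, Set


def find_likely_cause(logs: List[List[str]], min_logs: int = 2) -> Optional[Set[str]]:
    """Scan event logs one computer at a time, starting with Computer 0.

    Assumption for illustration only: a likely cause is reached when exactly
    one event is common to every log scanned so far, after at least
    `min_logs` logs have been examined.
    """
    common: Optional[Set[str]] = None
    for n, log in enumerate(logs):
        events = set(log)
        # Keep only the events seen on every computer scanned so far.
        common = events if common is None else common & events
        if n + 1 >= min_logs and len(common) == 1:
            return common  # a single shared event: report it as the likely cause
    return None  # no cause yet; the agent may have to wait for more data


# Hypothetical logs: Computer 0 first, then two remote computers.
logs = [["A", "B", "C"], ["B", "C", "D"], ["B", "E"]]
cause = find_likely_cause(logs)
```

In this sketch, returning `None` corresponds to the case in which no cause is determined and the agent waits until more data is available.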

In step 250, the agent may take some action. For example, it may report the likely cause of an item of interest. It may take a corrective action, for example, preventing this problem from happening again. For example, if a malicious web site is judged to be the cause of a problem, this web site may be blocked from user access in the future. If a bad computer driver is determined to be the cause of the item of interest, the driver may be fixed or replaced with a properly functioning driver.

It should be noted that in step 230, it is possible for a separate test computer, or even the user's own computer 101, to play back a sequence of events prior to an item of interest, to determine if the item of interest can be replicated. For example, the test computer can browse to the web site to determine if the pop-up ads are generated after browsing to this site. These kinds of tests or experiments may be performed in an automated fashion, without human intervention, on a virtual machine as already described.

FIG. 3 shows event logs 310, 320, and 330 on more than one computer. Each log has a time 340 and associated event 350. The detection service agent 102 may have access to these event logs via a network. Additionally, these event logs may be transferred from the various computers to another computer, for example a computer on which the agent 102 resides. If an item of interest is detected on a user's computer with associated event log 330, the agent 102 may determine that event “B”, which occurred at 10:01 on one computer, 10:32 on another computer, and 3:01 on the user's computer, is the likely cause, simply because all three users have reported the item of concern and all three users have event B occurring within a half-hour window of time prior to the item of concern.
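
By way of a non-limiting illustration, the time-window comparison described for FIG. 3 may be sketched as follows. The sketch assumes that each log entry is a pair of a time (in minutes since midnight) and an event name, and that the time at which the item of concern was observed is known for each computer. The function names, the observation times, and the events other than “B” are hypothetical and do not appear in FIG. 3.

```python
from typing import List, Set, Tuple

Log = List[Tuple[int, str]]  # (minutes since midnight, event name)


def events_before(log: Log, item_time: int, window: int) -> Set[str]:
    """Events that occurred within `window` minutes before the item of concern."""
    return {name for t, name in log if item_time - window <= t < item_time}


def common_candidates(logs: List[Tuple[Log, int]], window: int = 30) -> Set[str]:
    """Events found inside the window on every computer that reported the item."""
    sets = [events_before(log, item_time, window) for log, item_time in logs]
    return set.intersection(*sets) if sets else set()


# Event B at 10:01, 10:32, and 3:01 as in the example above; the observation
# times (620, 650, 200) and events A, C, D are assumed for illustration.
reports = [
    ([(590, "A"), (601, "B")], 620),
    ([(632, "B"), (640, "C")], 650),
    ([(100, "D"), (181, "B")], 200),
]
candidates = common_candidates(reports)  # evaluates to {"B"}
```

Intersecting the per-computer sets is one simple way to express "occurring on all machines within the window"; a weighted or statistical correlation could be substituted without changing the overall flow.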

It should be noted that when an item of interest (e.g., a problem, concern, or delight) is detected in step 210, the system agent 102 may transfer all or a portion of the event stream or logs 310, 320, 330 to optional server 103 or outside service 104. The server 103 attempts to decode the cause of the item of interest by examining the event list as described. If a number of items of interest are detected by several users, the system may assume that a serious item of concern has been encountered, which may trigger the search for a cause, or highly prioritize the search when many such searches are underway. Once the likely cause has been determined, the server system 103 may take action as described in step 250, including repairing an infected machine or machines 101. A list of known causes of problems gradually evolves and may be available to users, IT shops, companies, groups of users, help desks, and to the agent 102. The server 103, client 101, or service 104 may maintain such lists, which are updated to reflect new patterns or signatures of causes of problems. (The combination of the events and the resulting errors may be considered as part of an overall signature.) Potential actions in step 250 also include “rolling back” the system configuration on computer 101 to the time prior to the events that caused the problem. It is possible that more than one event 350 is deemed to be the likely cause of an item of interest. Thus, the term “cause” may refer to a cluster of events that led to an item of interest.

Additional theoretical means for determining likely causes for an item of interest are now discussed. In particular, the arrangement via which two or more sequences of events can be compared and correlated will now be discussed. One example of such an arrangement would be a program to analyze a particular event sequence and hypothesize a finite-state acceptor for it. A finite-state acceptor is a mathematical abstraction that can easily be embodied in a program. The subject of finite-state acceptors is treated in the academic discipline of computational linguistics, as in the classic text “Introduction to Automata Theory, Languages and Computation,” J. Hopcroft and J. Ullman, Addison-Wesley, 1979. A finite-state acceptor is a finite-state automaton with no outputs. Inputs to this automaton cause it to change state. If the final state of the automaton is found to be one of several designated as reflecting success, the input sequence is said to have been accepted.
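
By way of a non-limiting illustration, a finite-state acceptor of the kind just described may be embodied in a short program such as the following sketch. The class name `Acceptor`, the event names, and the choice to leave the state unchanged when an event has no defined transition are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable


class Acceptor:
    """A finite-state automaton with no outputs."""

    def __init__(self, start: State, accepting: Set[State],
                 transitions: Dict[Tuple[State, str], State]) -> None:
        self.start = start
        self.accepting = accepting
        self.transitions = transitions

    def final_state(self, events: Iterable[str]) -> State:
        state = self.start
        for event in events:
            # Assumption: an event with no defined transition is ignored.
            state = self.transitions.get((state, event), state)
        return state

    def accepts(self, events: Iterable[str]) -> bool:
        """True if the final state is one designated as reflecting success."""
        return self.final_state(events) in self.accepting


# Hypothetical acceptor: visiting a site and then clicking a link leads to
# the success state 2.
acceptor = Acceptor(
    start=0,
    accepting={2},
    transitions={(0, "visit_site"): 1, (1, "click_link"): 2},
)
accepted = acceptor.accepts(["mouse", "visit_site", "key", "click_link"])
rejected = acceptor.accepts(["click_link", "visit_site"])
```

Here the first sequence is accepted because unrelated events such as "mouse" and "key" do not move the automaton, while the second is rejected because the events occur in the wrong order.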

The creation of a finite-state acceptor that represents a sequence of events that is the root cause of a problem (or item of interest) begins with one such sequence known to have caused the problem (or item of interest). A program analyzes that sequence and eliminates events that are known not to contribute to the creation of the problem. This elimination may be done through the application of heuristics, for example. The program then constructs a finite-state acceptor from the reduced sequence. The finite-state acceptor is then exercised with a second sequence of events known to have caused the problem. If the final state of the acceptor indicates that the sequence caused the problem, then the acceptor is considered to have been tested successfully. Additional testing may be required to improve confidence in the acceptor. If the final state of the acceptor does not indicate that the sequence caused the problem, a random alteration of the finite-state acceptor is then performed. This alteration may consist of the removal or addition of a single state. If the resulting acceptor works properly on both the original sequence and on the additional sequence it now becomes a candidate for general use.
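
By way of a non-limiting illustration, the procedure above may be sketched as follows. The sketch makes several simplifying assumptions: the heuristic elimination is modeled as removal of events in a known-benign set, the acceptor is a linear chain with one state per retained event, and the random alteration is limited to the removal of a single state. All function and event names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set


def reduce_sequence(events: Sequence[str], benign: Set[str]) -> List[str]:
    """Heuristic elimination: drop events assumed not to contribute."""
    return [e for e in events if e not in benign]


def accepts(chain: Sequence[str], events: Sequence[str]) -> bool:
    """Linear acceptor: state i advances on chain[i]; other events are ignored."""
    state = 0
    for e in events:
        if state < len(chain) and e == chain[state]:
            state += 1
    return state == len(chain)  # success only if the last state was reached


def derive_acceptor(first: Sequence[str], second: Sequence[str], benign: Set[str],
                    rng: random.Random, max_tries: int = 100) -> Optional[List[str]]:
    chain = reduce_sequence(first, benign)
    if not chain:
        return None  # nothing left to build an acceptor from
    if accepts(chain, second):
        return chain  # tested successfully on the second sequence
    for _ in range(max_tries):
        if len(chain) <= 1:
            break
        i = rng.randrange(len(chain))  # random removal of a single state
        candidate = chain[:i] + chain[i + 1:]
        if accepts(candidate, first) and accepts(candidate, second):
            return candidate  # works on both sequences: candidate for general use
    return None


first = ["mouse", "visit_site", "open_mail", "click_link"]
second = ["visit_site", "key", "click_link"]
result = derive_acceptor(first, second, {"mouse"}, random.Random(0))
```

In this example only the removal of the "open_mail" state yields an acceptor that works on both sequences, so repeated random alterations converge on the chain "visit_site" followed by "click_link".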

The random alteration of the finite-state acceptor may be expanded to include additional transitions, or multiple state changes. A measure of distance may be used to determine how close a given acceptor is to determining whether a given event sequence has caused a problem, that measure of distance being the number of transitions between the final state and the nearest state indicating success.
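
By way of a non-limiting illustration, the measure of distance described above may be computed with a breadth-first search over the acceptor's transitions, as in the following sketch. The transition table format (a mapping from a state and event to the next state) and the function name `distance_to_success` are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Optional, Set, Tuple

State = Hashable


def distance_to_success(final_state: State, accepting: Set[State],
                        transitions: Dict[Tuple[State, str], State]) -> Optional[int]:
    """Number of transitions from `final_state` to the nearest success state.

    Returns None if no success state is reachable.
    """
    # Build a successor graph that ignores which event labels each transition.
    graph: Dict[State, Set[State]] = {}
    for (src, _event), dst in transitions.items():
        graph.setdefault(src, set()).add(dst)

    seen = {final_state}
    queue = deque([(final_state, 0)])
    while queue:
        state, dist = queue.popleft()
        if state in accepting:
            return dist  # breadth-first order guarantees the nearest one
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical acceptor with success state 3.
table = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}
d = distance_to_success(1, {3}, table)
```

A distance of zero means the sequence was accepted; smaller positive distances indicate acceptors that came closer to accepting the sequence and may be preferred when choosing which altered acceptor to keep.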

This invention may be run as a service for a user, group of users, company, or service provider. Fees may be charged based on a number of criteria such as: access to other machines (step 240), prioritization of problem finding when more than one user seeks a cause, and nature of action taken in step 250. The precise nature of items of interest investigated, and how fast problems are repaired, may be a function of a service level agreement and service plan level.

In a further implementation of the present invention, a server can preferably be configured to collect and store the signatures and errors. The server can optionally query each system on the network for the particular signature and error code, and if found, automatically implement a corrective action by proactively repairing the error where it exists on other systems on the network.

As problems are identified and repaired, this information is preferably maintained and the information is used to form a set of “best practices” which can then be used to populate other machines and inform help desk personnel. For example, a web site may be banned, or certain e-mail automatically discarded, if they lead to the formation of any of the following entities: trojan horses (keyloggers and backdoors, which open up system 101 to an attacker's control or use of system 101 to send spam emails), worms (which usually arrive as an email attachment and destroy data), dialers (which change the dial-in number of a modem connection to premium rate numbers causing high phone bills), spyware, adware, and hijackers (which cause Web browsers to behave improperly). Access to the banned site can be automatically disabled via software or by a firewall configuration.

A further aspect of the present invention provides the ability for the correlation component to provide proactive measures to combat future infections by developing a list of troublesome events and deploying those event lists to other computer systems to be used as a set of recommendations or constraints to prevent future problems.

In view of the foregoing, it will be appreciated that there are broadly contemplated, in accordance with at least one presently preferred embodiment of the present invention, arrangements and methods of event detection in computer systems. Preferably, an item of interest may be detected. At least one event is preferably determined to be correlated with the item of interest. Thereafter, at least one probable cause of the item of interest is preferably determined.

Among other things, the item of interest may include at least one of: malfunctioning software, slow software, adware, spyware, at least one virus, corruption of information, defective I/O, defective network connectivity, browser hijacking.

In determining at least one event near to the item of concern, at least one event can preferably be analyzed within a predetermined time threshold on at least one machine. The at least one machine can comprise at least one of: a user's machine and a machine other than a user's machine.

A finite-state acceptor can preferably be employed to determine at least one event near to the item of concern.

Correlating at least one event with an item of concern may preferably involve either or both of: analyzing at least one event and an item of concern of a user; and analyzing at least one event and an item of concern of another user or users.

The determination of at least one probable cause can preferably involve testing a corrective method and ascertaining whether the corrective method adequately attends to the item of concern on one or more machines.

At least one event, among other things, could comprise at least one of: at least one mouse event; at least one keyboard event; at least one browser request via at least one of http and ftp; at least one mail event via SMTP; running software; file creation; file alteration; software installation, hardware installation, at least one signature of a CPU, disk, I/O, or memory use. Operating systems may contain inefficiencies and errors that cause them to fail when a user runs a program or takes other seemingly innocuous actions. Thus, events may arise from normal system use.

It is to be understood that the present invention, in accordance with at least one presently preferred embodiment, includes an arrangement for detecting an item of interest, an arrangement for determining at least one event within a time to the item of interest, an arrangement for correlating the at least one event with the item of concern, and an arrangement for thereafter determining at least one probable cause of the item of concern. Together, these elements may be implemented on at least one general-purpose computer running suitable software programs. They may also be implemented on at least one integrated circuit or part of at least one integrated circuit. Thus, it is to be understood that the invention may be implemented in hardware, software, or a combination of both.

If not otherwise stated herein, it is to be assumed that all patents, patent applications, patent publications and other publications (including web-based publications) mentioned and cited herein are hereby fully incorporated by reference herein as if set forth in their entirety herein.

Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention.

1. A computer-method of event detection in computer systems, said method comprising the steps of: detecting an item of interest; determining at least one event near to the item of interest; correlating the at least one event with the item of interest; and thereafter determining at least one probable cause of the item of interest.
2. The method according to claim 1, wherein the item of interest comprises at least one of: malfunctioning software, slow software, adware, spyware, at least one virus, corruption of information, defective I/O, defective network connectivity, browser hijacking, beneficial system behavior.
3. The method according to claim 1, wherein said step of determining at least one event is undertaken via analyzing at least one event within a predetermined time threshold on at least one machine.
4. The method according to claim 3, wherein the at least one machine comprises at least one of: a user's machine and a machine other than a user's machine.
5. The method according to claim 1, wherein said step of determining at least one event is performed via a finite-state acceptor.
6. The method according to claim 1, wherein said correlating step comprises at least one of: analyzing at least one event and an item of interest of a user; and analyzing at least one event and an item of interest of another user.
7. The method according to claim 1, wherein said step of determining at least one probable cause comprises testing a corrective method and ascertaining whether the corrective method adequately attends to the item of interest on one or more machines.
8. The method according to claim 1, wherein the at least one event comprises at least one of: at least one mouse event; at least one keyboard event; at least one browser request via at least one of http and ftp; at least one mail event via SMTP; running software; file creation; file alteration; software installation, hardware installation, at least one signature of a CPU, disk, I/O, or memory use.
9. The method according to claim 1, wherein the method is implemented on a computer network which comprises at least one of: a peer model network, a shared server model network.
10. The method according to claim 1, wherein said step of detecting the item of interest is performed in a site remote from the item of interest.
11. The method according to claim 10, wherein the step of detecting the item of interest is performed by a service provider other than an owner of a machine in which the item of interest occurred and wherein the detection is performed on a fee-per-service basis.
12. An apparatus for providing event detection in computer systems, said apparatus comprising: an arrangement for detecting an item of interest; an arrangement for determining at least one event near to the item of interest; an arrangement for correlating the at least one event with the item of interest; and an arrangement for thereafter determining at least one probable cause of the item of interest.
13. The apparatus according to claim 12, wherein the item of interest comprises at least one of: malfunctioning software, slow software, adware, spyware, at least one virus, corruption of information, defective I/O, defective network connectivity, browser hijacking.
14. The apparatus according to claim 12, wherein said arrangement for determining at least one event is adapted to analyze at least one event within a predetermined time threshold on at least one machine.
15. The apparatus according to claim 14, wherein the at least one machine comprises at least one of: a user's machine and a machine other than a user's machine.
16. The apparatus according to claim 12, wherein said arrangement for determining at least one event is adapted to determine at least one event via a finite-state acceptor.
17. The apparatus according to claim 12, wherein said correlating arrangement is adapted to perform at least one of: analyzing at least one event and an item of interest of a user; and analyzing at least one event and an item of interest of another user.
18. The apparatus according to claim 12, wherein said arrangement for determining at least one probable cause is adapted to test a corrective apparatus and ascertain whether the corrective apparatus adequately attends to the item of interest on one or more machines.
19. The apparatus according to claim 12, wherein the at least one event comprises at least one of: at least one mouse event; at least one keyboard event; at least one browser request via at least one of http and ftp; at least one mail event via SMTP; running software; file creation; file alteration; software installation, hardware installation, at least one signature of a CPU, disk, I/O, or memory use.
20. The apparatus according to claim 12, wherein the apparatus is implemented on a computer network which comprises at least one of: a peer model network, a shared server model network.
21. The apparatus according to claim 12, wherein said arrangement for detecting an item of interest is located in a site remote from the item of interest.
22. The apparatus according to claim 21, wherein said arrangement for detecting an item of interest is operated by a service provider other than an owner of a machine in which the item of interest occurred and wherein the detection is performed on a fee-per-service basis.
23. A program storage device readable by machine, tangibly embodying a program of instructions executed by the machine to perform method steps for event detection in computer systems, said method comprising the steps of: detecting an item of interest; determining at least one event near to the item of interest; correlating the at least one event with the item of interest; and thereafter determining at least one probable cause of the item of interest.